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A Method for Simulating Correlated Non-normal Systems of Statistical Equations 

Real world data often fail to meet the underlying assumptions of normal statistical theory. 
Many statistical procedures in the psychological and educational sciences involve models 
that may include a system of statistical equations with non-normal correlated variables 
(e.g., factor analysis, structural equation modeling, or other complex applications of the 
general linear model). Monte Carlo techniques are used to test the appropriateness of 
statistical procedures when the underlying assumptions of these procedures are violated. 
There is a paucity of methods for generating systems of statistical equations in a simple 
an efficient manner. Thus, the focus of the current study is to derive a general procedure 
for simulating correlated non-normal systems of statistical equations with a focus on 
computational efficiency. The procedure allows for the systematic control of correlated 
non-normal (a) stochastic disturbance terms, (b) independent variables, and (c) dependent 
and independent variables within a system. A numerical example is provided to 
demonstrate the procedure. The results of a Monte Carlo simulation are provided to 
demonstrate that the proposed method generates the desired population parameters and 
intercorrelations. 
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A Method for Simulating Correlated Non-normal Systems of Statistical Equations 

1. Introduction 

It has been documented that data sets in the psychological and educational 
sciences often violate the usual parametric assumptions underlying normal curve theory 
(Blair, 1981; Bradley, 1968, 1982; Micceri, 1989; Pearson & Please, 1975). Further, 
many variables of interest (e.g., reaction time) are intrinsically non-normal (Miller, 1988; 
Zumbo & Coulombe, 1997). In view of these concerns, behavioral and other applied 
researchers have relied on the results of Monte Carlo studies to aid in the proper 
application of various statistical techniques. For example, Monte Carlo methods may be 
used to compare the small sample properties of a test statistic with its competitor(s) or 
whether these properties are consistent with the statistic’s asymptotic approximation 
(e.g., Headrick & Rotou, 2001; Headrick & Sawilowsky, 2000). 

With the advances made in quantitative methods, and the availability of the 
modem desktop computer, Monte Carlo methods are widely applicable to many areas of 
statistical research. As a result, more sophisticated methods of simulating data have 
become available for conducting Monte Carlo studies. For example, Markov chain Monte 
Carlo methods (e.g., the Gibbs or slice sampler, Gelfand & Smith, 1990; Robert & 
Casella, 1999) are commonly used to generate posterior distributions to carry out 
Bayesian analyses in the context of learning or item response theories (e.g., Albert, J. H., 
1992; Verguts & De Boeck, 2000). Other applications of modem computer intensive 
techniques include: the method of approximate bootstrap confidence (ABC) intervals 
(Efron & Tibishirani, 1998); network-based direct Monte Carlo sampling in the context 
of conditional logistic regression for evaluating drug withdrawal symptoms (Mehta, 



Patel, & Senchaudhuri, 2000); and likelihood inference with missing data (Gilks, 
Richardson, & Spiegelhalter, 1998). 

With this plethora of uses for Monte Carlo methods in mind, there may be 
occasions when it is desirable to investigate the properties of statistics that involve 
systems of statistical equations under a variety of conditions. For example, latent factor 
theory of intelligence (e.g., Spearman, 1904) was the impetus for the development of 
factor analytic methods. The subsequent generalization of factor theory (Thurstone, 1940) 
resulted in many applications in the social sciences including its common use in the 
development and validation of psychometric scales (e.g., personality measures). 

However, the theory underlying these applications was based on normal curve theory that 
also included the assumption of uncorrelated disturbance terms. 

Developments in structural equation modeling (SEM) allow researchers to model 
correlated disturbance structures. Most SEM software packages use maximum likelihood 
estimation (MLE) procedures as a default option. Unfortunately, MLE has been 
demonstrated to be extremely sensitive to departures from multivariate normality 
(Muthen & Kaplan, 1985). For example, suppose the simple factor analysis model 
depicted in Figure 1 represents a two-factor model of intelligence where the two latent 
factors (e.g., performance and verbal intelligence) are assumed to follow normal 
distributions. In practice, the six manifest variables (F ’s) used to measure these 
constructs may not have normal distributions. Further, the errors from these six tests may 
also be non-normal and correlated due to response bias. Thus, the factor analytic model in 
Figure 1 is an example of a system of equations where it may be desired to simulate (a) 




S 



correlated non-normal manifest variables (Y's), (b) correlated latent variables having 
normal distributions ( X ’s), and (c) correlated non-normal error terms ( f ’s). 

More generally, consider the p-th equation from a system of T equations as 
follows: 



y. B 






( 1 ) 



where and have dimension (.Vxl), is (Nx.kp), is (^pXl),and is a 

real positive scalar. Combining all T equations of the form in (1) yields a linear system 
that can generally be expressed as: 



y = xp -I- o£ , (2) 

where y and e have dimension (TWxl), x is (TNxK), pis (Xxl), where 



K = kp , and o represents T scalars associated with each of the T equations. The 

stochastic disturbances (e ) in (2) are also assumed to have expected values of zero and 
unit variances. 

If the disturbance terms in (2) are contemporaneously correlated (e.g., is 

correlated with e^ ) then a gain in efficiency can be achieved by testing the system jointly 

using the method of generalized least squares (GLS) (e.g., Judge, Hill, Griffiths, 
Lutkepohl, & Lee, 1985, p. 447). This approach of joint estimation is perhaps better 
known as “seemingly unrelated regression equation estimation” (Zellner, 1962). 
Moreover, the stronger the correlations are between the disturbance terms (or the weaker 
the correlations are between the independent variables) within (2), the greater the 
efficiency of GLS relative to ordinary least squares (OLS) (Dwivedi & Srivastava, 1978). 
Thus, it may be desirable to study the relative Type I error and power properties of the 
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OLS and GLS estimators under non-normal conditions. Such an investigation would 
usually be carried out using Monte Carlo methods. To determine any advantages of GLS 
relative to OLS, a variety of non-normal distributions with varying degrees of correlation 
between the disturbances would usually be included in the Monte Carlo study. 

Generalized linear models (GLMs) and nonparametric tests have been suggested 
as alternatives to OLS when the stochastic disturbance populations are non-normal. In 
terms of GLMs, the disturbances can be specified to be some known continuous 
distribution functions (e.g., exponential) (Hilbe, 1994). As such, statisticians conducting 
Monte Carlo studies could investigate the validity and robustness of test statistics for 
various GLMs, including OLS regression (a GLM with an identity link and normal 
conditional distributions), when the error distributions have been misspecified. For 
example, suppose the disturbance populations in (2) follow a gamma distribution with 
parameters that yield typical fan-shaped heteroscedastic patterns. Under these conditions, 
one could compare OLS and nonparametric (rank) regression procedures with a GLM 
using various gamma distributions specified as the stochastic disturbance vector in (2). 
Additionally, if the disturbances in (2) are non-normal and correlated, then a GLM with 
generalized estimating equations (GEEs) could also be included in the study. 

G F.F. S have gained considerable attention as a technique for analyzing data with 
dependent non-normal errors (Liang & Zeger, 1986). In terms of the error component in 
(2), the GEEs are GLS matrices that model their correlational structures while the GLM 
specifies the distributional shapes of these terms (Horton & Lipsitz, 1999). Other 
statistical analyses that may involve systems of statistical equations in Monte Carlo 



studies include: heirachical linear models; time series analysis; and other applications of 
the GLM. 

Most statistics textbook authors discuss the validity of linear models or test 
statistics in terms of the various assumptions concerning the stochastic disturbance 
populations (e.g., Cook & Weisberg, 1999; Neter, Kutner, Nachtsheim & Wasserman, 
1996). For example, the usual OLS regression procedure has the assumptions that the 
stochastic disturbance terms be independent and normally distributed with conditional 
expectations of zero and constant variances. As such, in order to examine the properties 
of systems of statistical equations using Monte Carlo methods, it is necessary to have an 
appropriate data generation procedure that would allow for the a priori specification of 
the distributional shapes and correlation structures of the stochastic disturbance 
populations (such as the vector e in equation 2). Further, it is desirable that this 
procedure be both efficient and general enough to enable the simulation of a variety of 
statistical problems that may arise e.g., autocorrelation, multicollinearity, non-normality, 
unequal variances or regression slopes, and other violations of assumptions. 

There are procedures that will simulate correlated non-normal distributions (e.g., 
Headrick & Sawilowsky, 1999; Vale & Maurelli, 1983). These procedures are able to 
generate k correlated non-normal variables for a single equation as in (1) or a system of 
independent equations. However, these procedures have limitations with respect to their 
ability to generate a system of correlated non-normal statistical equations. This can be 
demonstrated by considering six non-normal distributions X, ,..., Xg generated with zero 

means and unit variances from the Headrick and Sawilowsky (1999) procedure where 
each of the X’s has a unique non-normal distribution and a unique pairwise correlation 
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with the other variables. To form a system of two regression equations, let X, and 
represent the two dependent measures with Xj and X 4 as the independent variables for 
the first equation and Xj and Xg as the independent variables for the second equation. 
Further, let m, and represent the resulting error terms from the OLS regression of the 
dependent measures on the independent variables for this system. 

This approach to creating a system of statistical equations presents a problem if 
control is desired over Mj and terms of their distributional shapes and correlation. 
Specifically, the resulting skew, kurtosis, and correlation between the error terms would 
be difficult to analytically determine. Moreover, the variances of m, and would also 
be unequal. These problems are exacerbated when more equations are introduced into the 
system, which may also include various specified degrees of correlation and non- 
normality. 

Another problem with respect to the Headrick and Sawilowsky (1999) and Vale 
and Maurelli (1983) multivariate power methods is that these procedures are extensions 
of the Fleishman (1978) univariate power method. As such, not all non-normal (marginal) 
distributions can be generated in terms of the various possible combinations of skew and 
kurtosis. For example, the lower boundary point of kurtosis for symmetric distributions is 
-1.15132 (Headrick & Sawilowsky, 2000). Thus, it is not theoretically possible to 
simulate a (rectangular) uniform density using these procedures. (See Headrick & 
Sawilowsky, 2000, for a more complete discussion on the derivation of the boundary for 
the Fleishman coefficient model.) 

To circumvent the aforementioned problem, Headrick (2000) derived a 
polynomial transformation for the power methods that allows for the additional control of 



the fifth and sixth moments. As a result, a much larger family of distributions is possible 
to simulate in terms of the available combinations of skew and kurtosis. Further, simple 
families of distributions also exist in terms of various combinations of fourth and sixth 
standardized cumulants. 

2. Purpose of the Study 

There is a paucity of methods for simulating systems of correlated statistical 
equations that enable the evaluation of more modem sophisticated statistical procedures 
as described above. Thus, the purpose of this study is to derive a general procedure that 
allows for the generation of systems of statistical equations with correlated non-normal 
distributions with the least amount of difficulty. More specifically, the objectives are to 
(a) extend the Headrick (2000) polynomial transformation to develop a method that 
generates systems of statistical equations with correlated non-normal dependent and 
independent variables and correlated non-normal stochastic disturbance terms, and (b) 
provide Mathematica (Wolfram, version 4.0, 1999) notebooks (available from the first 
author) that solve for power constants, intermediate correlations, and slope coefficients 
for implementing the procedure. 

The procedure is derived generally for simulating a system of T equations. A 
numerical example is subsequently provided to demonstrate the procedure. The results of 
a Monte Carlo simulation are also provided to demonstrate that the proposed procedure 
generates the desired population parameters and intercorrelations. 

3. Mathematical Development 

Consider the T equations in (2) more explicitly as follows: 
y, = + A.^.i + • • • + + • • • + Pxj^M + ■ • • + > (3a) 
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(3b) 





(3d) 



The proposed procedure creates from the right-hand sides of (3a) 

through (3d). That is, for all T equations, the Y variables are linear combinations of 
randomly generated X and e terms. Because each of the disturbance terms (£^) has unit 
variance, the scalar terms ( <r ) are included to allow for the creation of equal or unequal 
variance conditions. It should also be noted that it is not necessary for each of these 
equations to have the same number of independent variables. (Eliminating a particular 
independent variable in the system can be accomplished by setting its associated slope 
coefficient to zero.) 

Because the independent variables and stochastic disturbance terms in (3a) 
through (3d) are generated and correlated in the same manner, we have selected the 
independent variables and X in (3b) to refer to in deriving the proposed method. 

Correlations between and or between and X^. in (3b) and (3c) can be created 

in an analogous manner with the method presented below. 

More specifically, the X and X in (3b) and 3c) are generated and correlated 

using the fifth-order polynomial transformation derived by Headrick (2000, Equation 16) 
as follows: 




(4a) 
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(4b) 



where X',. and ~ N(0,1). The constant coefficients in (4a) and (4b) are 

determined by simultaneously solving equations (37) through (42) from Headrick (2000, 
Appendix 1) such that X^,. and X^j have zero means, unit variances, and the third (^'i ), 

fourth ( Y 2 ), fifth ( ), and sixth ( y ^ ) standardized cumulants from desired probability 
density functions. 

The values of X',. and X'^ in (4a) and (4b) that are used to create the 
independent variables ( X^,. , X ) in (3b) are correlated at an intermediate level 
according to the following Lemma: 

Lemma 1. Let be real-valued where |r^, |e [O, l] pi^po,pk > and let Z; , V, ~ 

iid N(0,1). Further, let = r^^Z, + V^l-rpQ , where ? = 1 if r^o < t=0 if r^o ~ 1- 

If x;, = vZ... ^"*1 Kj = ^Z, +W„^l^, then x;, and - N(0,1), 

with correlation of p , , - r^f.r ,r , when t-l, and p - r .r . when t=0. In 

XpiXpj " " X piX pj r rJ 

particular, note that p , , -r ^ when t- r = r = 1, and p - when r = r 

XpiXpj p PJ XpiXpj t' 

and for when t=0. 

Proof. The result is a consequence the proof of Lemma 1 from Headrick and Sawilowsky 
(1999). 

Note that Lemma 1 enables, for example, the i-th independent variable for each of 
the T equations in a given system to be the same. This can be shown, from a more general 
use of Lemma 1, as a special case of where r^o ~ V “ ~ 
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The intermediate correlation between X' , and X' A p , , ) in Lemma 1 is 

r rJ X pfX pj 

determined by setting the following equation from Headrick (2000, Equation 26) to the 



desired post-correlation between and X^jip 



^ pi ^ pj 



) in (4a) and (4b): 




The purpose of correlating X'. and X' at the intermediate level, p , , , is to control 

for the non-normalization effect of the constants in (4a) and (4b) such that the resulting 
Xp, and X pj have the desired post-correlation. 

Using Lemma 1 more generally, the following theorem can be stated: 

Theorem i. If X^, , X , X^, , X ^ . have zero means and unit variances from 

equations of the form in (4a) and (4b), specified correlations Px^^Xpi > Px,iX^ > Px^iX^^ > 

Pg g from equations of the form in (5), cov(f ^ , X^,. ) = 0 , and cov(f^ , X^,. ) = 0 , V,.^, , 

then the following correlations hold with respect to (3b) and (3c): 



^ pi pjPx, 



( 6 ) 





( 7 ) 
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Pyv = 






,K + E ^PI +22 P»0,iPy,x, y, + Z + ^'Z^r^,lf’y,‘. 

V pi PJ*P‘ V 9' 



, and (8) 






^9 . l^P Z! ^pi ^ Z! P piP Pi 

V pi pj^pi 



(9) 



Proo/. See Appendix 1. 

Thus, given exogenous correlations between the independent variables ( )» 



the pairwise correlation(s) between the dependent variable (Y^) and the independent 
variables (X ^.) are determined by simultaneously solving a system of k equations (of the 
form in equation 6) in terms of k unknowns ( ) for the desired correlation(s) of Py^x,,, ■ 
The correlations Pw , Pyy , and Py can subsequently be determined by evaluating 

(7), (8), and (9) using the specified correlations and slope coefficients from (5) and (6). 

4. Numerical Example 

Suppose it is desired to generate a system of three equations of the form in (3b). 
For k=l independent variables, where each disturbance term has unit variance, the 
equations are listed as follows: 

y, =Ao+Ai^11+A2^12+^^1^1. (10) 

Y2 — P20 ^ + 1^2^2 > and ( 11 ) 

1^3 =Ao +Al ^31+^2^32 +0-3^3. (12) 

where var(t 7 ,.f,.) = tJ,^(l) = 1 , V,.^j2,3. 

Letting r=0 in Lemma 1, for this example, the algorithms to generate correlated normal 




deviates are as follows: 



( 13 ) 



(14) 

(15) 

(16) 

(17) 

(18) 



Similarly, the algorithms to generate the correlated errors based on Lemma 1 are: 



e; = rX+wJl-r,\ 


(19) 


e '2 = rX\ + W 2 -Jl “ ^ 2 ^ . and 


(20) 


eXrX+W2^|^-r2 • 


(21) 



(In accordance with Theorem 1, note that Zj and Z,\ are independent.) The resulting 

£•' in (13) through (21) are normally distributed with zero means, unit variances, 

and have the intermediate correlations according to Lemma 1. 

Given this general framework, suppose it is desired to approximate the following 
correlated non-normal X ’s and £ ’s in (10), (11), and (12): a) Xjj = X^i = X^^ = 
exponential, b) X [2 = X ^2 = =double exponential, and c) =Cauchy, 

where Px„x„ =- 10 > = - 35 , Px„x,;, =.10-, =- 40 . ^ Py^x^ 

= .50, PYyX^^ ~ Py}X }2 ~ -60, and Pe,ey ~ Pe^Ey ~ PsyEy — •40. 

The following steps are taken: 



X 

11 


+ Wjl-r,], 


^\2 “ ^12^1 


+ W„^l-r,l 


-^21 “ ^21^1 


+ wJi -4 


-^22 “ ^22^1 


^22 ” ^22 


^"31 “ ^31^1 


. xxr L 2 


-^32 “ ^32^1 


+ W„yll-4 
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1. Obtain the constants to generate the standardized non-normal distributions (see 
Headrick, 2000, pg. 57). The standardized cumulants and constants are listed 
below in Table 1. 

2. Determine the values of r ^^ , r^, and r, to use in (13) through (21). This 
is accomplished using (5) by setting the left-hand side to the desired post- 
correlation (/?-. y =.10; Py y =.35; Py y =.70) and substituting the 

'«r^A|iAi2 * ^1X^21 '-^31-^32 ^ 

constants into the right-hand side and then solving for these values. Accordingly, 
this yields values of r,, =.350171, =-321446, =.657772, r ^2 =.597454, 

rj, = .937894, = .831441, and T; = = .749509. 

3. Obtain the slope coefficients to use in (10), (11), and (12). This is accomplished 
by simultaneously solving two equations of the form in (6) for each of the desired 
post-correlations {Py v )■ The coefficients are as follows (see Appendix 2): 

=0.431834, P 2 , = P 21 =0.466760, and P^, = =0.464851. 

4. Substitute the values of r,, ,.. ., , r^, r^, and obtained from step (2) into 

equations (13) through (21) to generate X[^, ^21 > ^22 > ^2 i ^31’ 

X32, . The intermediate correlations are Px\Xm~ ^11^12 --112561, Px'^^x'-^ ~ 

V 21 =-392989, = .779803, and = 

rjTj = = .561764. 

5. Substitute the values of X [^ , X'^, e[\ X^i , X22, £'2 \ and X[i , X32, from 
step (4) into equations of the form in (4a) and (4b) to generate the non-normal 



deviates X,,, X,^, f, ; X^, , X^j, £ 2 ',s^r\d X3,, X32, with the desired post- 
correlations. 

6. Substitute X„ , X,^ , f, ; X^, , X^^ , ; and X3, , X32 , e^ from step (5) and the 

slope coefficients from step (3) into (10), (11), and (12) to generate the values of 
y, , Yj, and 73 with the desired post-intercorrelations. 

Given the specified correlations from above, the correlations between the 
dependent variables are follows (see Appendix 2): = .369251, Pyy^ = .393505, and 

Py^y^ = .508267. The correlations between Py^^„ example, are as follows 

(see Appendix 2): Py^x„ =141997 and =.138445. 

5. Monte Carlo Simulation 

To evaluate the proposed procedure, the population parameters p, cr^ , Y 2 , 

Yj, Y 4 , and the correlations between and within equations (9), (10), and (11) from the 

numerical example in the previous section were simulated using an algorithm coded in 
Fortran 77. The algorithm used subroutines NORMS 1 and UNIl from RANGEN (Blair, 
1987) to generate pseudo-random normal and uniform deviates. Independent sample sizes 
of A= (10,10,10), (100,100,100), (1000,1000,1000), and (10000,10000,10000) were 
generated for simulating the specified standardized cumulants and various correlations. 
Values of all standardized cumulants and correlations were calculated for each repetition 
and then averaged across 50,000 repetitions. Thus, the average values of pip), cr^ 

(O'"), y,(y,), Y 2 i? 2 \ Y 3 (f 3 X YiiYi), Px,,X,,(Px,,xJ^ 

)> Py^Xu (Py,Xu^' ^r,x,2 ( /^r,x,2 )> Py-^x^^'^Py^x-^^)' /^r^x^ ( )> 
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)’ P^.H ( P^.^^ )> P^.H ( ^^.^3 )> ^^ 3^3 ( ^^ 3 ^ 3 ) computed were based on 

//X 50,000 random deviates. The average values of the population parameters are reported 
in Table 2 and Table 3. 

Inspection of Tables 2 and 3 indicate that the proposed procedure generated 
average values of the population parameters that were in close agreement with their 
respective population parameters. In terms of the simulation results concerning the 
average values of correlation, the procedure produced excellent agreement between these 
values and their associated population parameters even for sample sizes of N= 10. 

6. Discussion 

The simulation results from the previous section indicated that the proposed 
procedure generated the desired population parameters and specified intercorrelations. It 
is also beneficial to provide visual representations of some of the distributions generated 
by the Headrick (2000) procedure. Specifically, illustrated in Figure 2 are the relative 
frequency histograms of the approximations of the exponential and double exponential 
pdf%. Inspection of panels A and B in Figure 2 reveals that the proposed procedure 
approximated the considered theoretical densities very well. 

The proposed procedure is also useful for generating other systems of equations 
based on the GLM. For example, the method could be used to generate T independent 
equations to investigate the statistical properties of competing nonparametric tests in the 
context of analysis of covariance (ANCOVA) or repeated measures. 

With respect to ANCOVA, one attractive feature of the proposed method is that it 
has an advantage over other competing procedures (e.g., Knapp & Swoyer, 1967) to the 




extent that it allows for the creation of populations with unequal regression slopes while 
maintaining the between-group equal variance assumption. This can observed by 
inspecting equations (3b) and (3c) where the slope coefficient(s) could change (i.e., made 
unequal) while the error terms remain unchanged. Subsequent to any changes made to the 
slope coefficients, the variate and covariate correlations can then be determined from 
equation (6). Thus, this feature of the proposed method would be a remedy to the 
problem with the algorithms used to correlate data in the Monte Carlo studies by 
Hamilton (1976) andPeckham (1968). Specifically, the algorithms used in these studies 
were unable to simulate the unequal slope condition without also simultaneously 
violating the between-group equal variance assumption. (See Rogosa, 1980, for a 
discussion on the validity of the Hamilton, 1976, and Peckham, 1968 studies with respect 
to this issue.) 

Many other applications of the proposed procedure to the GLM are possible. 

From the GLM perspective, the dependent variables (Y^) could represent the same 

variable collected under T different conditions or at time points 1,...,T. In either case, the 
independent variables ( ) could represent static covariates (e.g., pre-existing ability 

measures often used in ANCOVA models). Thus, using the special case of Lemma 1 
noted above, the i-th independent variable in each of the T equations would be the same. 
Conversely, the independent variables ( ) could also be different for each of the T 

equations and may be used to represent time-varying covariates (i.e., some variables 
measured over the T periods along with Y^). There is also the possibility of using a 

combination of both static and time-varying covariates ( X ^. ). As such, a variety of 
analytic procedures are comparable under this data generation process. 
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The proposed method would also allow the generation of repeated measures data 
of nonspherical structures with non-normal disturbances and non-normal covariates. It 
should also be noted that with non-normal stochastic populations one could specify all of 
these distributions to be the same. This assumption is implicit in parametric analyses of 
repeated measures data because normal disturbance populations are implied. Likewise, 
robust estimators such as the rank-based shift model (Lehmann, 1998) also assume that 
disturbance populations have identical (but not necessarily normal) shapes. 

By contrast, fully nonparametric hypotheses (e.g., Akritas & Arnold, 1994) make 
no assumptions about the distribution of the stochastic disturbance populations and 
therefore do not require the conditional (disturbance) populations to have identical 
variances or distributional shapes. Fully nonparametric hypotheses differ from robust 
estimators because they do not attribute the rejection to location parameters alone but 
rather to any distributional differences, a concept recently referred to as “stochastic 
heterogeneity” (Varga & Delaney, 1998). Thus, hypotheses of this form reduce the risk 
of drawing incorrect conclusions about the likely sources of a statistically significant 
result, but do so at the cost of not being able to characterize precisely how population 
distributions differ (Serlin & Harwell, 2001). 

The proposed method would also be useful in the context of time series analysis. 
Specifically, the procedure could be used to model instrumental variables which address 
one of the problems with certain autoregressive models (e.g., the adaptive expectations 
model) where the dependent measure from a preceding time period (T^ ) is included as a 

stochastic independent variable in the subsequent (^-th) period. As such, the vectors 
and are usually correlated. Using the proposed method, a Monte Carlo study could be 



arranged to simulate non-normal “proxies” correlated at various levels between and 
. The correlations can be determined from equation (9). 

Application of the proposed method to the GLM is flexible and has the potential 
to simulate other types of models where disturbance populations change over time. For 
example, when data sets with repeated measures have mistimed measures or missing 
data, the GLM with GEEs and hierarchical linear models are often considered preferable 
to the usual OLS univariate and multivariate procedures. 

It may also be reasonable to consider a model where the correlation structure is a 
function of time between the observations. Thus, data could be also be generated for 
Monte Carlo studies involving dynamic regression models that have distributed lags or 
moving averages. 

7. Conclusion 

Systems of linear statistical equations with correlated non-normal variables are 
widely applicable in many experimental situations. Some examples include confirmatory 
factor analysis, hierarchical linear models, time series analysis, and other applications of 
the general linear model (e.g., analysis of covariance, repeated measures). Thus, the 
concern of the present study was to develop a procedure that enables the simulation of 
systems of statistical equations. The procedure allows for the creation of systems with 
non-normal variables and specified intercorrelations between a) dependent variables, b) 
independent variables, c) dependent and independent variables, and d) stochastic error 
terms. The results of a Monte Carlo simulation indicate that the proposed procedure 
generated the desired population parameters and specified intercorrelations. 
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TABLE 1. Values of Cq,...,c^ that were used to simulate the desired non-normal 
distributions^. 



Dist. Co 


Cl 


C 2 


C 3 


C 4 


^5 


1 -0.307740 


0.800560 


0.318764 


0.033500 


-0.003675 


0.000159 


2 0.000000 


0.727709 


0.000000 


0.096303 


0.000000 


-0.022320 


3 0.000000 


-0.159685 


0.000000 


0.355036 


0.000000 


-0.009473 


“The three distributions are described as follows: (1) approximate exponential =2, 


Vi =6, /3= 24, /4 


= 120 ); (2) approximate double exponential (y^ =0, 


^2 = 3, 


Vi = 0> Vi - 30 ); and (3) approximate Cauchy =0, ^2 


= 25, y, = 0, 


^4 =1500) 



TABLE 2. Values of average correlation from the simulation^. 



Population 

Correlation 


Average 

Correlation 


V=10 


0 

II 


V=10^ 


V=10" 


.4000 


PyxXn 


.3999 


.3998 


.3998 


.3999 


.4000 


PyxXn 


.3998 


.4000 


.4001 


.4000 


.3935 


Py.y. 


.3933 


.3933 


.3931 


.3934 


.5000 


PyiXn 


.4998 


.4998 


.4999 


.5000 


.5000 


Py^Xn 


.5003 


.5000 


.5001 


.4999 


.3693 


Py^yi 


.3689 


.3690 


.3690 


.3692 


.1420 


Pyi^n 


.1421 


.1420 


.1421 


.1419 


.1384 


Py^^M 


.1383 


.1383 


.1385 


.1384 


.6000 


Py^x^, 


.5994 


.5995 


.5995 


.5999 


.6000 


PyyXyi'' 


.5999 


.5999 


.6000 


.5999 


.5083 


Py.h 


.5083 


.5082 


.5082 


.5083 


.1000 


Px„x,, 


.0999 


.0999 


.1000 


.1000 


.3500 


Px^,Xn 


.3501 


.3500 


.3502 


.3499 


.7000 


P X 31X32 


.6995 


.6995 


.6996 


.6999 


.4000 


P^x^l 


.3993 


.3997 


.3995 


.4000 


.4000 


Pe^Ey 


.3992 


.3999 


.3994 


.4000 


.4000 


Pe^e^ 


.3997 


.3999 


.4000 


.4000 


“The population parameters for the variables are: (1) X^^ 


= ^21 = ^31 = 


(ri= 2 . 


=6,^3=24,^4=120); ( 2 ) A 


— Y - 
12 “ ^ 22 “ 


^^32-(r.= 


^ 0 , y^^ 3, ^3 


= 0, = 30 ); 


and (3) £^- 


= £3= (ri =0, ^2 


= 25, 


= 0, ^4 =1500). 
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TABLE 3. Average values of mean( ft ), variance( ), and the third( Y\ ), fourth( Yj )- 
fifth( fj ), and sixth( ) standardized cumulants from the simulation^*. 





Xu 


X,2 


ei 


X21 


X22 


^ 2 


X3, 


X32 


ez 


N=10 


M 


^0.00004 


-0.00038 


-0.00021 


0.00010 


0.00007 


-0.00008 


-0.00034 


-0.00001 


-0.00008 




0.99293 


1.00042 


0.99736 


0.99928 


1.00083 


0.99856 


0.99872 


0.99998 


1.09866 


fi 


1.99887 


-0.00352 


-0.00480 


1.99949 


0.00023 


0.00450 


1.99901 


0.00101 


-0.00170 


Vi 


5.95550 


3.00977 


24.8245 


5.95497 


3.02343 


24.8869 


5.96749 


2.99814 


24.7938 


?3 


23.8746 


-0.07745 


-0.18880 


23.8247 


-0.00188 


0.81641 


23.9470 


0.01741 


-0.55553 


Ya 


1 1 r» /r /in 
i ly.vHy 


30.4301 


1474.255 


110 

i iO.JOO 


n 1 >1 

Ji.ZH-OJ 


1484.101 


1 1 1 

i ly.iJZ 


nn ny'on 

zi/.yuoz 


1 Ainc c c 1 

IH-/ J.JJ i 


N=\Qp- 


P' 


-0.00005 


-0.00008 


-0.00007 


0.00002 


-0.00012 


-0.00001 


-0.0001 


-0.00001 


-0.00001 




0.99921 


1.00020 


0.99960 


0.99918 


1.00005 


0.99973 


0.99891 


0.99990 


1.00975 


Y, 


1.99896 


-0.00129 


-0.00321 


1.99860 


-0.00008 


0.00633 


1.99785 


0.00040 


0.00433 


Yz 


5.99765 


3.00426 


24.9644 


5.98813 


3.00426 


24.9696 


5.98946 


3.00009 


24.9383 


Yz 


24.0212 


-0.03029 


-0.38283 


23.8938 


0.00304 


0.10304 


23.9333 


-0.02657 


0.09315 


Ya 


120.265 


30.2585 


1481.675 


118.601 


30.2260 


1495.552 


119.097 


30.0729 


1492.868 


N=10^ 


P 


-0.00007 


-0.00018 


-0.00024 


0.00010 


0.00003 


-0.00013 


-0.0001 


0.00002 


-0.00013 




0.99915 


1.00033 


0.998555 


0.99938 


1.00042 


1.00030 


0.99885 


1.00026 


1.00128 


Yi 


1.99811 


-0.00242 


-0.00200 


1.99967 


0.00012 


0.00341 


1.99809 


0.00029 


-0.00275 


Yz 


5.99076 


3.00629 


24.8942 


5.99425 


3.01366 


25.0105 


5.99569 


3.00495 


24.9178 


Yz 


23.9727 


-0.06022 


-0.00207 


23.9342 


-0.02068 


0.09304 


24.0306 


-0.00248 


-0.07326 


Ya 


120.061 


30.2127 


1486.226 


118.974 


30.1845 


1502.074 


120.518 


30.1534 


1493.898 


N=10^ 


P 


-0.00003 


-0.00003 


-0.00002 


0.00000 


-0.00002 


0.00000 


-0.00001 


0.00001 


0.00000 




0.99984 


0.99997 


0.99996 


0.99985 


1.00002 


0.99990 


0.99908 


1.00000 


0.99999 


Y, 


1.99902 


-0.00025 


0.00050 


2.00030 


-0.00024 


-0.00020 


1.99803 


0.00016 


0.00201 


Yz 


6.00722 


2.99966 


24.9951 


5.99153 


3.00039 


24.9932 


5.99649 


2.99952 


25.0004 


Yz 


23.9842 


-0.00484 


0.01022 


24.0127 


-0.00897 


0.06241 


23.9822 


-0.00160 


0.19972 


Ya 


120.016 


29.9999 


1499.177 


119.971 


30.0030 


1499.134 


119.993 


29.9869 


1499.332 



“The population parameters for the variables are: (1) Xu = X^i = X 3 , = (Yi =2, Yz 
Y,=24 ,y,=120);(2) X,, = X,, = X,, = (y, =0, ^2 = 3, ^ 3 = 0 , ^4 = 30 ); and (3) 
£^ = = £j= (Y^ = 0 , Yz= 25, Y3 =0- V4= 1500). 
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FIGURE 1. A two factor confirmatory factor analysis model with cross-loadings and 
correlated error terms (f ’s). The parameters p and 6 denote correlations between the 
X’s and between the e ’s. 




A) Exponential B) Double Exponential 

FIGURE 2. Approximations of the exponential and double exponential pdf’s generated 
by the Headrick (2000) polynomial transformation from equations (4a) and (4b). The 
constants used to simulate the densities are listed in Table 1. The sample size used was 
10,000. For amenability to the standard exponential pdf, a constant of 1.0 was added to 
each value of X in panel A. , 




26 

29 



Appendix 1 



ERIC 



Without loss of generality, Theorem 1 can be shown from the independent use of 
Lemma 1 to create , X^,. ; and £^ , such that cov(fp,Xp,) = 0 and 



P' <7 



cov(f^ , X^, ) = 0 , ^ , and with random variables that follow a standard normal 



distribution. 



PROOF: p 



L[F^X^,]-E[yjL[X^,] 






{ {E{Y; ] - {E{Y^ ])0 X {E[X ] - (£[X ^, ])^ ) } 



2x,l/2 > 



( 22 ) 



£[y^x^,]-£[Fj£[x^,] 



{ (£[f; ] - (£[F^ ])0 X (£[x;, ] - (£[x ^, ])^ ) } 



2x,l/2 > 



(23) 



Py.y. = 



E[Y^Y^]-E{Y^]E{YJ 



{ {E[Yl ] - (£[F ])" ) X (£[7/ ] - (£[F ])")}'/" 



, and 



(24) 






q q ■ 



Pw.) { (£[V 2 J ])2 ) ^ (£[(tr,f , )" ] - {E[(7£Af)yl^ 



(25) 



Setting Ci„ = 1 and all other constants ( C(,„ , Cj.. , . . . , Cj.. ) to zero in equations of the form 
in (4a), (4b), and (5) gives the standard normal case where X = X',. , X = X'^. , 

= ^qi > ~ ^qj ' ^p~ ^q~ ^q' Px^^X^j ~ P X'^^X'^j > Px^^X^ ~ P X'^X'^ > Px^^X^, ~ 



Px'^.x-^ . and = p^y^ . Further, in Lemma 1, (3b), and (3c), let all X^, be a function 
of Z, and let all X^, be a function of ■ Similarly, let be a function of Z[ and 
be a function of , where cov(Zj,Z,') = cov(Z 2 ,Z 2 ) = co\(y,V') = 0. 

It follows from expressing Zj as a function of Zj in Lemma 1 that: 

£[X,J = r,,£[ZJ + £[H',,]Vi^=''po£[Z,l + £[Vl7r^) + £[H',,],/r^ =0, 

because £[Z, ] = E[V] = E[W^ ] = 0 . 
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It follows analogously that: 



Ela,s, ] = <7, (r._ £[z; ] + m., 1,/i^ ) 

= + £[tV,_ ]^l-r,‘) =0, because 

£:[z;] = £:[v'] = £:[w, ] = o. 



Define the variances of and <y„£„ as: 



var[X^, ] = E[Xl ] - {E[X^, , and (26) 

var[cr^f J var[f J = (r'^(E[e'^]-(E[ejy) , since cr^ is a constant scalar. (27) 



Applying Lemma 1 to both (26) and (27) gives: 



var[X^,.] = = E[r;y^-r^,r^y^ + + -r^- + 

^fpor,]VZ^ ^l-r^o + + ''pV.X' 1 “ 0 

var[f J = E[e^] = Eir^V'^ - rl^rlV'^ + 2r,V% - r^W,] + 

^£,0 ^1 - ~ r?, r£-,ore, ^1 1 ~ 0> 



(28) 



(29) 



because E[X^.] = £'[£'q] =0. 

Taking expectations in (28) and (29) yields: 
var[X^,] = £[Xj]=l,and 

varlf^] = E[e^^] = 1, because 

E[Z^] = E[Z{^] = ] = E[W^] = E[V^] = E[V'^] = 1, and 

E[Z, ] = E[Z[] = ] = E[W^, ] = E[V] = £[V'] =0. 

Thus, 

var[c7,^J=<T’£[£=l = <7'a) = c7,^ 



It follows from analogous arguments that: 



£[X^,] = £[fp]=0, 

var[X^,] = £[X^^,] = l,and 

v^[a^e^] = alE[el] = (jl{\) = al. 

Thus, 

E[Yp ] = Ppo and E[Y^ ] = - because E[e^ ] = E[e^ ] = E{X ^. ] = E{X ] = 0 , V,.^i . 



Hence, 



Eiy,y,i-(/g,.)(0) 



£[y,^„i-(>5,oXO) 



, and 






£[(y^)(rT^g,)]-(yg,o)(0) 



Expressing (Z 2 ) as a function of Z; (Z,^) from Lemma 1 yields: 



pi 



X/ = (A. +<T,(r,_.Z,' + V'/-r/_. + 

Qi 

=(/»,. +<^,('i,z:+w'.,/^)+Z^„<'-„z, +w„^))x 

pi 



(30) 

(31) 

(32) 

(33) 

(34) 

(35) 

(36) 
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= (/),c + (%oz: +v'^i- <0 + M'., + 



( 37 ) 



pi 

(A. + ^ ('■.,.2^ + + '^•. 



(38) 



(y, ) = (^,„ + (r,_ z; + IV,^ .f ^ ) + X (r„Z, + W„ 7i^ )) X 

pi 

<^,(''.,.z:+'''^/i^+M'../^)- 



(39) 



Expanding the right-hand sides of (34) through (39) and taking expectations gives: 

ElY;]=<ri+/il.+'Z0l+2-Z0,/i,P,,.,, (40) 



pi 



PJ^PI 



e[y;] = <tI^I31„+Y.PI *2Y,P,APw 

qi qj^qi 



pj^pl 



£[y,7f„i = EA.Px,».. 

qi 

E[Y/,] = +^po/^,o+l,l,^pi^niPx,x, . and 



pi qi 



E[iY^)(CT^8^)]^CT^P,^,^. 

Substituting (40) through (45) into (30) through (33) and simplifying gives the 
expressions in (6), (7), (8), and (9) as follows: 

^pi ^^pj*pi ^pjPx„x,j 

Py.x„: - 



(41) 

(42) 

(43) 

(44) 

(45) 



^ '^^pi^pjPx^iX^j 

V pi pj*pi 



o 
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33 



Py,x„ ~ 



'Tjqi^qiPx.^X,, 



^ S PvP<uPx,,X^ 

V 9' 'U*v 



Py.y. = 



^ P^nP^p^q '*' '^Lpi^Lqi^ P‘^P‘PXpiX^i 



J^P ■•■ S ^P< ^ S fipifipjPx^,X^j + X ^ S PqiPqjPx^,X 

V pi pj*pi V 9' V*P> 



(^pP 



P^PpPq 






J^P + Z ^P. + 2 X PpiPpjPx.x, 

V P' Pi*P' 



which completes the proof. 




= , and 

<U 
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Appendix 2 



For the desired correlations of ~ = .40, /Oy^Xj, ~ Py^Xi^^ =.50, and 



PYiX^i = Py,x^^ = .60, the slope coefficients and P^\ A 2 ! Ai Pii 

determined by simultaneously solving the following three independent sets of two 
equations for the unknowns ( , P ,2 )'■ 



.40 = 



.40 = 



.50 = 



A,+A2(io) 



P„+A,(M) 

Vi+A’+A2 + 2A,A2(10)’ 

A , i + A „(- 35 ) 

Vl ■*■ A21 + /^22 + (- 35 ) 



, and 



, and 



(46) 



(47) 



(48) 



.50 = 



y^,,+y^,,(.35) 



+ Pl+ Pl, + 2PM35)' 



(49) 



.60= 



A, +A!(-70) 



Vi+A.+A‘+2A,A2(™) 



, and 



(50) 



.60 = 



A , + A ,(- 70 ) 



V 1 + A.+A" +2A,A2(-70)’ 



( 51 ) 



which gives = y^jj =0.431834; y^^j = y^^j =0.466760; and y^ 3 j = y^jj =0.464851. 



Given the solutions for y^jj 
the correlations yOj,_j,^ , yOj,,,^ , , 

and (8) as follows: 



, y^i 2 , y^ 2 i , y^jj , y^ 3 i, andy^ 3 ^ from (46) through (51), 
PY^x^^ > Py^Xi^ determined from equations (7) 
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Py^y^ = .40+ (0.431 834)(0.466760)(. 197432) + (0.431 834)(0.466760)(. 187942) + 
(0.431834)(0.466760)(.185900) + (0.431834)(0.466760)(.185860)/ , (52) 

{(l + 2(0.431834)' + 2(0.431834)'(.10))x(l+ 2(0.466760)' + 2(0.466760)' (.35))}^ 

Py^y^ = .40 + (0.43 1 834)(0.46485 1)(.2873 19) + (0.43 1834)(0.46485 1)(.268 181) + 

(0.43 1 834)(0 .46485 1)(.258961) + (0.43 1 834)(0 .46485 1)(.258954) / , (53) 

|l + 2(0.431834)' + 2(0.43 1 834)' (.10))x(l + 2(0.464851)' + 2(0.464851)' (.70))}^ 



Py^y^ = .40 + (0.46485 1)(0.466760)(. 197432) + (0.46485 1)(0.466760)(. 187942) + 
(0.464851)(0.466760)(.185900) + (0.46485 1)(0.466760)(. 185 860)/ , (54) 

|l + 2(0.466760)' + 2(0.466760)' (.35))x (l + 2(0.46485 1)' + 2(0.46485 1)' (.70))}^ 

(0.466760)(. 197432) + (0.466760)(. 1 85960) 

Pyx — / / \ ? 

" " V(l + 2(0.466760)' + 2(0.466760)' (.35)) 



Py,x 



12 



(0.466760)(. 1 87942) + (0.466760)(. 1 85 860) 
7(1 + 2(0.466760)' + 2(0.466760)' (.35)) 



(56) 



Simplifying (52) through (56) gives: Py^y^ = .369251, Py^^ = .393505, Py^y^ 



= .508267, 



Py y =.141997, and yOy „ =.138445. 

r' ¥2X11 ^ ^ 2^12 

Note that the correlations in the numerators of (52) through (56) were determined 
by evaluating equation (5) from substituting the prespecified correlations from step 4 of 
the numerical example and constants from Table 1. For example, the correlation Px^x^, ~ 
.197432 in equation (52) was determined by evaluating (5) from substituting the values of 
r^^ =.350171, =.657772, and the constants from Table 1 representing the exponential 

and double exponential distributions. 
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